
[Nightly] Enhance XPU test workflows #1723



Draft: mengfei25 wants to merge 44 commits into main


mengfei25 (Contributor) commented Jun 6, 2025

  1. Remove sourcing of oneAPI in microbench; use the PyPI packages instead
  2. Fix the Windows test workflow issue in nightly
  3. Modify the torchbench installation to avoid reinstalling torch unnecessarily
  4. Extend the XPU ops UT timeout to 5 hours
  5. Set up the test env via a common script
  6. Clean up the workspace before launching tests
  7. Build torchvision and torchaudio together when torch is built from source
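Item 3 can be sketched as a check that skips reinstalling torch when the installed build already matches the expected commit. This is a minimal sketch under assumptions, not the PR's implementation: the function names are hypothetical, and comparing torch.version.git_version (the attribute the workflow already reads) against an expected commit ID is one possible approach.

```shell
#!/usr/bin/env bash
# Sketch only: avoid reinstalling torch when the installed build already
# matches the expected commit. Function names are hypothetical.
set -euo pipefail

# Print the git commit of the installed torch, or an empty line if torch
# cannot be imported.
installed_torch_commit() {
  python -c 'import torch; print(torch.version.git_version)' 2>/dev/null || echo ""
}

# Succeed (status 0) when torch must be (re)installed: either no torch is
# present or its commit differs from the expected one.
needs_torch_reinstall() {
  local expected="$1" actual="$2"
  [ -z "${actual}" ] || [ "${actual}" != "${expected}" ]
}
```

A torchbench setup step could then wrap its install in `if needs_torch_reinstall "${TORCH_COMMIT_ID}" "$(installed_torch_commit)"; then ...; fi`, leaving an already-matching torch in place.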

mengfei25 requested a review from chuanqi129 on June 6, 2025 01:48
@@ -85,6 +85,17 @@ rm -rf ./tmp
bash third_party/torch-xpu-ops/.github/scripts/rpath.sh ${WORKSPACE}/pytorch/dist/torch*.whl
python -m pip install --force-reinstall tmp/torch*.whl

# Build torchvision torchaudio
Contributor

Please use an env var or parameter to control this behavior. It should be off by default unless the var/parameter is set to on.
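The opt-in behavior requested here could look like the following minimal sketch. BUILD_TORCH_EXTRAS is a hypothetical variable name chosen for illustration; the build steps themselves are left as a placeholder comment.

```shell
#!/usr/bin/env bash
# Sketch: make the torchvision/torchaudio source build opt-in.
# BUILD_TORCH_EXTRAS is a hypothetical variable name; unset means OFF.
set -euo pipefail

# Succeed only when the flag is explicitly ON (case-insensitive).
extras_build_enabled() {
  local flag="${BUILD_TORCH_EXTRAS:-OFF}"
  [ "${flag^^}" == "ON" ]
}

if extras_build_enabled; then
  echo "BUILD_TORCH_EXTRAS=ON: building torchvision and torchaudio from source"
  # ... existing build steps would go here ...
else
  echo "BUILD_TORCH_EXTRAS is not ON: skipping torchvision/torchaudio source build"
fi
```

Defaulting to OFF through `${BUILD_TORCH_EXTRAS:-OFF}` keeps existing callers unchanged; only a workflow that explicitly sets the variable pays for the extra build.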

This should also cover the Triton build. For the pinned Triton commit, we can run make triton directly under the PyTorch root dir. For a customized Triton commit, we can build it ourselves or directly leverage the script https://github.com/pytorch/pytorch/blob/main/.github/scripts/build_triton_wheel.py; see its usage in https://github.com/chuanqi129/pytorch/blob/fix_triton_version_split/.github/workflows/build-triton-wheel.yml#L158-L160. Before this step, we need to replace the content of the pinned Triton XPU commit file with the customized commit.
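The pin-replacement step described in this comment could be sketched as below. The pin file path in the example comment (.ci/docker/ci_commit_pins/triton-xpu.txt) is an assumption about the PyTorch source layout, and TRITON_XPU_COMMIT and select_triton_commit are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: point the pinned Triton XPU commit at a customized commit before
# building Triton. TRITON_XPU_COMMIT is a hypothetical variable; the pin file
# path in the example below is an assumption about the PyTorch layout.
set -euo pipefail

# Overwrite the pin file with a custom commit when one is given; otherwise
# leave the pinned commit untouched. Prints the commit that will be built.
select_triton_commit() {
  local pin_file="$1" custom_commit="${2:-}"
  if [ -n "${custom_commit}" ]; then
    printf '%s\n' "${custom_commit}" > "${pin_file}"
  fi
  cat "${pin_file}"
}

# Example (from the PyTorch root directory, path assumed):
#   select_triton_commit .ci/docker/ci_commit_pins/triton-xpu.txt "${TRITON_XPU_COMMIT:-}"
# then build as suggested in the review: make triton for the pinned commit,
# or build_triton_wheel.py for a customized one.
```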

cc: @RUIJIEZHONG66166

@@ -60,6 +60,9 @@ jobs:
HF_HUB_ETAG_TIMEOUT: 120
HF_HUB_DOWNLOAD_TIMEOUT: 120
steps:
- name: Cleanup Workspace
Contributor

Why do we need this change? If this test fails, the cause should be other jobs not cleaning the workspace at the end of the workflow.
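The reviewer's point is that each job should leave a clean workspace when it finishes. Inside a single shell step, one way to guarantee that even when the work fails is an EXIT trap. This is a minimal sketch; the function name and the choice of a trap (rather than, say, a separate cleanup step that always runs) are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: run a job's work and always empty its workspace afterwards,
# even when the work fails. Function name is hypothetical.
set -euo pipefail

# Run "$@" inside workspace $1, then delete everything in it on exit.
run_with_cleanup() {
  local ws="$1"; shift
  (
    # The subshell's EXIT trap fires whether the command succeeds or fails.
    trap 'find "${ws}" -mindepth 1 -delete' EXIT
    cd "${ws}"
    "$@"
  )
}
```

The command's exit status is preserved, so a failing test still fails the job while the workspace is emptied for the next one.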

- name: Checkout torch-xpu-ops
uses: actions/checkout@v4
- name: Prepare Stock Pytorch
- name: Cleanup Workspace
Contributor

Same here. I also suggest using an existing GitHub Action, or defining one directly, for the workspace cleanup, e.g. https://github.com/pytorch/pytorch/blob/main/.github/workflows/_xpu-test.yml#L338 or https://github.com/pytorch/pytorch/blob/main/.github/workflows/_xpu-test.yml#L83
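Whichever action is chosen, the cleanup it performs can be a small guarded script that a composite action or several workflows call. A minimal sketch; the function name and the guard conditions are assumptions and are not taken from the linked PyTorch workflow.

```shell
#!/usr/bin/env bash
# Sketch: a guarded workspace cleanup that several workflows could share.
set -euo pipefail

# Empty directory $1 (including hidden files) but keep the directory itself.
# Refuses empty paths, "/", $HOME, and paths that are not directories.
cleanup_workspace() {
  local ws="${1:-}"
  if [ -z "${ws}" ] || [ "${ws}" == "/" ] || [ "${ws}" == "${HOME:-}" ] || [ ! -d "${ws}" ]; then
    echo "cleanup_workspace: refusing to clean '${ws}'" >&2
    return 1
  fi
  find "${ws}" -mindepth 1 -delete
  echo "cleanup_workspace: emptied ${ws}"
}

# Example for a GitHub Actions step:
#   cleanup_workspace "${GITHUB_WORKSPACE}"
```

Keeping the directory itself (rather than rm -rf on it) avoids breaking runners that expect the workspace path to exist.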

@@ -0,0 +1,87 @@
#!/bin/bash

set -xe -o pipefail
Contributor

I feel this script is somewhat over-designed. I've added some suggestions.

else
pip install torch torchvision torchaudio --pre --index-url https://download.pytorch.org/whl/nightly/xpu
TORCH_COMMIT_ID=$(python -c 'import torch; print(torch.version.git_version)')
PYTORCH_VERSION="main"
Contributor

"main" seems ambiguous here, since this branch does not test a source build of the main branch.
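One way to remove the ambiguity is to label the run by how torch was obtained, together with the commit that was actually installed (the script already reads it through torch.version.git_version). A minimal sketch; the label format, the TORCH_LABEL variable, and the function name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: derive an unambiguous label for the torch under test instead of
# a hard-coded "main". Label format and names are hypothetical.
set -euo pipefail

# $1: how torch was installed ("source" or "nightly"); $2: torch git commit.
# Prints "<mode>-<first 12 chars of commit>".
torch_build_label() {
  local mode="$1" commit="$2"
  case "${mode}" in
    source|nightly) echo "${mode}-${commit:0:12}" ;;
    *) echo "unknown install mode: ${mode}" >&2; return 1 ;;
  esac
}

# Example:
#   TORCH_COMMIT_ID=$(python -c 'import torch; print(torch.version.git_version)')
#   TORCH_LABEL=$(torch_build_label nightly "${TORCH_COMMIT_ID}")
```

A separate variable is used so that PYTORCH_VERSION can keep its existing role in the install branches.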

python pytorch/torch/utils/collect_env.py
rm -rf /tmp/torchinductor_*
rm -rf ~/.triton/cache
./.github/scripts/setup_test_env.sh \
Contributor

Commented in the script file directly.
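The cache-clearing lines in this hunk (rm -rf /tmp/torchinductor_* and rm -rf ~/.triton/cache) hard-code the default locations. TorchInductor and Triton also honor the TORCHINDUCTOR_CACHE_DIR and TRITON_CACHE_DIR environment variables, so a slightly more robust sketch clears whichever location is in effect. The function name and its temp-root parameter are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: clear TorchInductor and Triton compile caches before a test run.
# Honors TORCHINDUCTOR_CACHE_DIR / TRITON_CACHE_DIR when set, otherwise
# falls back to the default locations the original script removes.
set -euo pipefail

# $1: temp root holding the default torchinductor_<user> dirs (default /tmp).
clear_compile_caches() {
  local tmp_root="${1:-/tmp}"
  if [ -n "${TORCHINDUCTOR_CACHE_DIR:-}" ]; then
    rm -rf "${TORCHINDUCTOR_CACHE_DIR}"
  else
    rm -rf "${tmp_root}"/torchinductor_*
  fi
  rm -rf "${TRITON_CACHE_DIR:-${HOME}/.triton/cache}"
}
```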

elif [ "${PYTORCH_VERSION}" == "nightly" ];then
python -m pip install torch torchvision torchaudio --pre --index-url https://download.pytorch.org/whl/nightly/xpu
else
python -m pip install ${WORKSPACE}/torch*.whl
Contributor

Does this step also install torchvision and torchaudio?

Contributor Author

Yes
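The answer follows from the glob: assuming the torchvision and torchaudio wheels are written to ${WORKSPACE} next to the torch wheel, ${WORKSPACE}/torch*.whl matches all three, because the torchvision and torchaudio wheel names also start with "torch". The sketch below shows this with placeholder wheel names that are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: show which wheels "${WORKSPACE}/torch*.whl" expands to.
# Wheel file names below are placeholders, not real build outputs.
set -euo pipefail

# Print, one per line, the basenames of wheels matched by torch*.whl in $1.
matched_torch_wheels() {
  local dir="$1" whl
  for whl in "${dir}"/torch*.whl; do
    # Skip the literal pattern when nothing matches.
    [ -e "${whl}" ] || continue
    basename "${whl}"
  done
}

demo_dir=$(mktemp -d)
touch "${demo_dir}/torch-0.0.0-py3-none-any.whl" \
      "${demo_dir}/torchvision-0.0.0-py3-none-any.whl" \
      "${demo_dir}/torchaudio-0.0.0-py3-none-any.whl" \
      "${demo_dir}/pytorch_triton_xpu-0.0.0-py3-none-any.whl"
# Lists the torch, torchvision and torchaudio wheels, but not the Triton one.
matched_torch_wheels "${demo_dir}"
rm -rf "${demo_dir}"
```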

mengfei25 marked this pull request as draft on June 17, 2025 01:30